Maria Nakhoul, Eliana Marostica, and Sunny Mahesh
We will be using 3 datasets that are freely available online from the CDC.
library(tidyverse)  # provides read_csv() and the %>% pipe used below
setwd("../")
iddr <- read_csv("data/Impaired_Driving_Death_Rate__by_Age_and_Gender__2012___2014__All_States.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## Location = col_character(),
## `All Ages, 2012` = col_double(),
## `All Ages, 2014` = col_double(),
## `Ages 0-20, 2012` = col_double(),
## `Ages 0-20, 2014` = col_double(),
## `Ages 21-34, 2012` = col_double(),
## `Ages 21-34, 2014` = col_double(),
## `Ages 35+, 2012` = col_double(),
## `Ages 35+, 2014` = col_double(),
## `Male, 2012` = col_double(),
## `Male, 2014` = col_double(),
## `Female, 2012` = col_double(),
## `Female, 2014` = col_double()
## )
ocdr <- read_csv("data/Motor_Vehicle_Occupant_Death_Rate__by_Age_and_Gender__2012___2014__All_States.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## Location = col_character(),
## `All Ages, 2012` = col_double(),
## `All Ages, 2014` = col_double(),
## `Age 0-20, 2012` = col_double(),
## `Age 0-20, 2014` = col_double(),
## `Age 21-34, 2012` = col_double(),
## `Age 21-34, 2014` = col_double(),
## `Age 35-54, 2012` = col_double(),
## `Age 35-54, 2014` = col_double(),
## `Age 55+, 2012` = col_double(),
## `Age 55+, 2014` = col_double(),
## `Male, 2012` = col_double(),
## `Male, 2014` = col_double(),
## `Female, 2012` = col_double(),
## `Female, 2014` = col_double()
## )
sebt <- read_csv("data/Percentage_of_Drivers_and_Front_Seat_Passengers_Wearing_Seat_Belts__2012___2014__All_States.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## `2012` = col_double(),
## `2014` = col_double(),
## Location = col_character()
## )
Summarize the variables, data types (temporal, networks, multivariate matrices, etc.), and key statistics (# of elements, # of attributes, # of timepoints, etc.) of your data set.
The key variables here are state, location (of the state), age group (all, 0-20, 21-34, 35+), year (2012, 2014), sex, and death rate. The data is multivariate and numeric: every value is a death rate. There are two timepoints: 2012 and 2014. These death rates are available for each of the 50 states plus Washington, D.C., and the United States as a whole.
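Because group and year are both encoded in the column names (for example `All Ages, 2012`), one way to expose them as separate variables is to reshape the table to long format. The following is a minimal sketch rather than the code used in this project; `iddr_toy` and its values are hypothetical, and only the column naming pattern is taken from the data above.

```r
library(tidyr)
library(tibble)

# Hypothetical two-row excerpt with the same column naming pattern as iddr;
# the values are made up for illustration.
iddr_toy <- tibble(
  State = c("A", "B"),
  `All Ages, 2012` = c(3.1, 4.2),
  `All Ages, 2014` = c(2.9, 4.0),
  `Male, 2012` = c(5.0, 6.1),
  `Male, 2014` = c(4.8, 5.9)
)

# Split each column name at ", " into a group label and a year,
# producing one row per state, group, and year.
iddr_long <- pivot_longer(
  iddr_toy,
  cols = -State,
  names_to = c("Group", "Year"),
  names_sep = ", ",
  values_to = "Death_Rate"
)
iddr_long$Year <- as.integer(iddr_long$Year)

print(iddr_long)
```

In the long table, age group or sex and year become ordinary columns that can be filtered or mapped to plot aesthetics.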
iddr %>%
summary()
## State Location All Ages, 2012 All Ages, 2014
## Length:52 Length:52 Min. : 1.200 Min. :1.600
## Class :character Class :character 1st Qu.: 2.500 1st Qu.:2.500
## Mode :character Mode :character Median : 3.600 Median :3.200
## Mean : 3.916 Mean :3.624
## 3rd Qu.: 4.800 3rd Qu.:4.600
## Max. :11.300 Max. :8.200
## NA's :2 NA's :3
## Ages 0-20, 2012 Ages 0-20, 2014 Ages 21-34, 2012 Ages 21-34, 2014
## Min. :0.600 Min. :0.700 Min. : 3.300 Min. : 3.000
## 1st Qu.:1.200 1st Qu.:0.850 1st Qu.: 5.800 1st Qu.: 5.200
## Median :1.600 Median :1.200 Median : 6.850 Median : 6.300
## Mean :1.621 Mean :1.411 Mean : 7.971 Mean : 7.173
## 3rd Qu.:1.950 3rd Qu.:1.950 3rd Qu.: 9.400 3rd Qu.: 8.200
## Max. :2.800 Max. :2.800 Max. :21.400 Max. :15.400
## NA's :28 NA's :34 NA's :10 NA's :11
## Ages 35+, 2012 Ages 35+, 2014 Male, 2012 Male, 2014
## Min. : 1.400 Min. :1.500 Min. : 1.700 Min. : 2.400
## 1st Qu.: 2.200 1st Qu.:2.350 1st Qu.: 4.100 1st Qu.: 3.700
## Median : 3.200 Median :3.100 Median : 5.700 Median : 4.900
## Mean : 3.585 Mean :3.537 Mean : 6.271 Mean : 5.658
## 3rd Qu.: 4.375 3rd Qu.:4.475 3rd Qu.: 7.525 3rd Qu.: 7.125
## Max. :12.000 Max. :8.100 Max. :17.400 Max. :13.100
## NA's :6 NA's :6 NA's :4 NA's :4
## Female, 2012 Female, 2014
## Min. :0.700 Min. :0.8
## 1st Qu.:1.150 1st Qu.:1.1
## Median :1.500 Median :1.3
## Mean :1.663 Mean :1.6
## 3rd Qu.:1.850 3rd Qu.:1.8
## Max. :4.000 Max. :4.3
## NA's :17 NA's :17
ocdr %>%
summary()
## State Location All Ages, 2012 All Ages, 2014
## Length:53 Length:53 Min. : 2.900 Min. : 2.300
## Class :character Class :character 1st Qu.: 5.350 1st Qu.: 5.250
## Mode :character Mode :character Median : 7.400 Median : 6.800
## Mean : 8.516 Mean : 8.104
## 3rd Qu.:11.050 3rd Qu.:11.100
## Max. :20.200 Max. :21.000
## NA's :2 NA's :2
## Age 0-20, 2012 Age 0-20, 2014 Age 21-34, 2012 Age 21-34, 2014
## Min. : 1.700 Min. : 1.200 Min. : 4.10 Min. : 3.90
## 1st Qu.: 3.400 1st Qu.: 3.100 1st Qu.: 8.85 1st Qu.: 8.40
## Median : 4.800 Median : 3.700 Median :12.00 Median :10.90
## Mean : 5.114 Mean : 4.466 Mean :13.64 Mean :12.67
## 3rd Qu.: 6.650 3rd Qu.: 5.800 3rd Qu.:17.95 3rd Qu.:16.20
## Max. :11.000 Max. :10.200 Max. :29.60 Max. :37.50
## NA's :10 NA's :12 NA's :6 NA's :6
## Age 35-54, 2012 Age 35-54, 2014 Age 55+, 2012 Age 55+, 2014
## Min. : 2.200 Min. : 2.600 Min. : 3.900 Min. : 3.500
## 1st Qu.: 4.800 1st Qu.: 5.600 1st Qu.: 6.975 1st Qu.: 6.700
## Median : 7.600 Median : 8.100 Median : 9.200 Median : 8.750
## Mean : 8.922 Mean : 8.624 Mean : 9.800 Mean : 9.393
## 3rd Qu.:12.300 3rd Qu.:11.600 3rd Qu.:12.025 3rd Qu.:11.425
## Max. :27.000 Max. :21.000 Max. :20.700 Max. :19.700
## NA's :8 NA's :8 NA's :7 NA's :7
## Male, 2012 Male, 2014 Female, 2012 Female, 2014
## Min. : 4.10 Min. : 3.80 Min. : 1.700 Min. : 1.500
## 1st Qu.: 6.40 1st Qu.: 6.90 1st Qu.: 4.175 1st Qu.: 4.100
## Median :10.20 Median : 9.70 Median : 5.400 Median : 5.000
## Mean :11.35 Mean :11.09 Mean : 5.852 Mean : 5.702
## 3rd Qu.:15.10 3rd Qu.:15.00 3rd Qu.: 6.950 3rd Qu.: 7.400
## Max. :29.30 Max. :27.70 Max. :12.900 Max. :14.300
## NA's :2 NA's :4 NA's :5 NA's :6
There are two time points in the OCDR data: 2012 and 2014. There are 2 main attributes, sex and age group (0-20, 21-34, 35-54, 55+). There are 52 elements (50 states, DC, and the USA as a whole). All of the data is numeric, where each value is the death rate per 100,000 population for that specific category.
sebt %>%
summary()
## State 2012 2014 Location
## Length:51 Min. :67.00 Min. :69.00 Length:51
## Class :character 1st Qu.:80.00 1st Qu.:82.50 Class :character
## Mode :character Median :84.00 Median :87.00 Mode :character
## Mean :85.18 Mean :86.57
## 3rd Qu.:91.00 3rd Qu.:92.00
## Max. :97.00 Max. :98.00
indicies_max_2012<-which(sebt$`2012`==max(sebt$`2012`))
sebt[indicies_max_2012,]
## # A tibble: 2 x 4
## State `2012` `2014` Location
## <chr> <dbl> <dbl> <chr>
## 1 Oregon 97 98 "Oregon\n(44.567912, -120.156945)"
## 2 Washington 97 95 "Washington\n(47.517368, -120.467672)"
indicies_min_2012<-which(sebt$`2012`==min(sebt$`2012`))
sebt[indicies_min_2012,]
## # A tibble: 1 x 4
## State `2012` `2014` Location
## <chr> <dbl> <dbl> <chr>
## 1 South Dakota 67 69 "South Dakota\n(44.35371, -100.373709)"
indicies_max_2014<-which(sebt$`2014`==max(sebt$`2014`))
sebt[indicies_max_2014,]
## # A tibble: 1 x 4
## State `2012` `2014` Location
## <chr> <dbl> <dbl> <chr>
## 1 Oregon 97 98 "Oregon\n(44.567912, -120.156945)"
indicies_min_2014<-which(sebt$`2014`==min(sebt$`2014`))
sebt[indicies_min_2014,]
## # A tibble: 1 x 4
## State `2012` `2014` Location
## <chr> <dbl> <dbl> <chr>
## 1 South Dakota 67 69 "South Dakota\n(44.35371, -100.373709)"
In 2012, the states with the highest percentage of drivers and front seat passengers wearing seat belts were Oregon and Washington (97% each), while the state with the lowest percentage was South Dakota (67%).
In 2014, the state with the highest percentage was Oregon (98%), while the state with the lowest percentage was again South Dakota (69%).
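The same lookups can also be written with `dplyr::filter()`, which keeps every row tied for the extreme value. This is a small sketch on three rows copied from the output above; `sebt_toy`, `max_2012`, and `min_2014` are names introduced here only for illustration.

```r
library(dplyr)

# Three rows copied from the sebt output shown above.
sebt_toy <- tibble::tibble(
  State = c("Oregon", "Washington", "South Dakota"),
  `2012` = c(97, 97, 67),
  `2014` = c(98, 95, 69)
)

# Keep every row tied for the maximum (or minimum) of a year column.
max_2012 <- sebt_toy %>% filter(`2012` == max(`2012`, na.rm = TRUE))
min_2014 <- sebt_toy %>% filter(`2014` == min(`2014`, na.rm = TRUE))

print(max_2012$State)
print(min_2014$State)
```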
Many of our sketches were done on whiteboards, but we have copies of our 5 design sheets below.
5 Design Sheets #1
5 Design Sheets #2
5 Design Sheets #3
5 Design Sheets #4
5 Design Sheets #5
*Edit 5/9/19: The 5 design sheets were made prior to the addition of the temporal dataset.
We have now added temporal data to our datasets. An updated, comprehensive explanation of our datasets is below.
The data we are using for this project consists of several parts. All the parts concern motor-vehicle-related deaths and come from 2 different sources. The first source is the CDC, from which we obtained 3 datasets: Impaired Driving Death Rate by Age and Gender 2012 & 2014 All States, Passengers Wearing Seat Belts 2012 & 2014 All States, and Motor Vehicle Occupant Death Rate by Age and Gender 2012 & 2014 All States. The death rate datasets are broken down by state, by age group (All Ages, Age 0-20, Age 21-34, Age 35-54, Age 55+ for occupant deaths; All Ages, Ages 0-20, Ages 21-34, Ages 35+ for impaired driving), and by gender (Male or Female), for the years 2012 and 2014. The impaired driving dataset contains death rates per 100,000 population, per state, for individuals with a blood alcohol concentration (BAC) of 0.08% or higher [1]. The seat belt dataset contains the percentage of drivers and front seat passengers wearing seat belts; this data was collected from the National Occupant Protection Use Survey (NOPUS) [2]. The motor vehicle occupant dataset contains death rates by age or gender per 100,000 population; this data was collected through the National Highway Traffic Safety Administration's (NHTSA) Fatality Analysis Reporting System (FARS) [3].
Since we didn’t have enough temporal data, we obtained more data from another source, the Insurance Institute for Highway Safety Highway Loss Data Institute. We were able to use this data because it comes from the same source as the data in the CDC datasets, FARS [4]. From this website we collected the fatal crash totals, which contain, for every state, the number of deaths and the deaths per 100,000 population. This fixed the temporal problem because we were able to obtain the same information for the years 2005 to 2017. We also collected the deaths by road user, which break the total deaths from the previous dataset down, per state, by whether the person killed was a car occupant, a pickup and SUV occupant, a large truck occupant, a motorcyclist, a pedestrian, or a bicyclist. Restraint use data was also obtained in order to get the percentage of observed seat belt use per state.
Our previous Visualization Tasks and Requirements:
The first and second requirement points stay the same. The data is of interest to the same audience, and the same types of information are derived from the 2 datasets, since both were initially collected by the same source (FARS). Our third requirement has changed, though, because our data now has more time points.
New Data Summary Statistics
#install.packages("rio")
library(rio)
setwd("../")
fatal_car_crashes<-import_list("data/fatal car_crash.xlsx",setclass = "tbl",rbind = TRUE)
deaths_by_road_users<-import_list("data/Deaths by road users.xlsx",setclass="tbl",rbind=T)
## New names:
## * `Car occupants ` -> `Car occupants ..2`
## * `Car occupants ` -> `Car occupants ..3`
## * `Pickup and SUV occupants` -> `Pickup and SUV occupants..4`
## * `Pickup and SUV occupants` -> `Pickup and SUV occupants..5`
## * `Large truck occupants ` -> `Large truck occupants ..6`
## * … and 9 more
# Row 52 of each of the 13 stacked yearly sheets (the row after the 51 state rows)
indicies=52*(1:13)
deaths_car_crashes=fatal_car_crashes[-indicies,]
line_graph<-fatal_car_crashes[indicies,]
colnames(line_graph)=c("State","Population","Deaths","Deaths_per_100000_Population","Year")
line_graph$Year=c("2017","2016","2015","2014","2013","2012","2011","2010","2009","2008","2007","2006","2005")
na_indicies=which(is.na(deaths_by_road_users$State))
deaths_by_road_users=deaths_by_road_users[-na_indicies,]
road_users_deaths=deaths_by_road_users[-indicies,]
# 51 rows per year, ordered from 2017 down to 2005
year=rep(2017:2005,each=51)
deaths_car_crashes=deaths_car_crashes[,1:4]
deaths_car_crashes$Year=year
road_users_deaths_column_names=c("State","Car_Occupant_Death_Number","Car_Occupant_Death_Percent","Pickup_and_SUV_Occupant_Death_Number","Pickup_and_SUV_Occupant_Death_Percent","Large_Truck_Occupant_Death_Number","Large_Truck_Occupant_Death_Percent","Motorcyclists_Occupant_Death_Number","Motorcyclists_Occupant_Death_Percent","Pedestrians_Occupant_Death_Number","Pedestrians_Occupant_Death_Percent","Bicyclists_Occupant_Death_Number","Bicyclists_Occupant_Death_Percent","Total_Occupant_Death_Number","Total_Occupant_Death_Percent","Year")
colnames(road_users_deaths)=road_users_deaths_column_names
road_users_deaths$Year=year
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Car_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
## State n
## <chr> <int>
## 1 California 1954
## 2 Texas 1375
## 3 Texas 1238
## 4 California 994
## 5 Florida 931
## 6 Florida 924
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Pickup_and_SUV_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
## State n
## <chr> <int>
## 1 Texas 1195
## 2 Texas 975
## 3 California 870
## 4 Florida 760
## 5 California 642
## 6 Florida 611
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Large_Truck_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
## State n
## <chr> <int>
## 1 Texas 93
## 2 Texas 81
## 3 California 47
## 4 Florida 46
## 5 Georgia 44
## 6 California 42
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Motorcyclists_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
## State n
## <chr> <int>
## 1 Florida 590
## 2 California 548
## 3 Florida 546
## 4 California 535
## 5 Texas 512
## 6 Texas 490
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Pedestrians_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
## State n
## <chr> <int>
## 1 California 867
## 2 California 742
## 3 Texas 672
## 4 Florida 654
## 5 Florida 576
## 6 Texas 419
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Bicyclists_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
## State n
## <chr> <int>
## 1 Florida 107
## 2 California 99
## 3 California 99
## 4 Florida 83
## 5 Texas 65
## 6 New York 57
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Total_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
## State n
## <chr> <int>
## 1 California 4329
## 2 Texas 3776
## 3 California 3623
## 4 Florida 3543
## 5 Texas 3504
## 6 Florida 3174
summary(road_users_deaths)
## State Car_Occupant_Death_Number Car_Occupant_Death_Percent
## Length:663 Length:663 Length:663
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Pickup_and_SUV_Occupant_Death_Number
## Length:663
## Class :character
## Mode :character
##
##
##
## Pickup_and_SUV_Occupant_Death_Percent Large_Truck_Occupant_Death_Number
## Length:663 Length:663
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Large_Truck_Occupant_Death_Percent Motorcyclists_Occupant_Death_Number
## Length:663 Length:663
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Motorcyclists_Occupant_Death_Percent Pedestrians_Occupant_Death_Number
## Length:663 Length:663
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Pedestrians_Occupant_Death_Percent Bicyclists_Occupant_Death_Number
## Length:663 Length:663
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Bicyclists_Occupant_Death_Percent Total_Occupant_Death_Number
## Length:663 Length:663
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Total_Occupant_Death_Percent Year
## Length:663 Min. :2005
## Class :character 1st Qu.:2008
## Mode :character Median :2011
## Mean :2011
## 3rd Qu.:2014
## Max. :2017
Across the years, the largest car occupant death counts come from Texas, California, and Florida.
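In the tables above, several states appear more than once even after `group_by(State)`, and `summary()` reports the count columns as character. One possible explanation, which is an assumption and has not been checked against the spreadsheets, is that some state names carry stray whitespace and that the counts were imported as text, in which case `max()` compares strings rather than numbers. A minimal sketch of a cleanup step on hypothetical toy rows:

```r
library(dplyr)

# Hypothetical toy rows: counts stored as text, and one state name
# carrying a trailing space (both are assumptions about the import).
toy <- tibble::tibble(
  State = c("California", "California ", "Texas", "Texas"),
  Car_Occupant_Death_Number = c("1954", "994", "1375", "1238")
)

# On character data, max() compares strings, so "994" beats "1954".
text_max <- max(c("1954", "994"))

# Trim the names and convert the counts before grouping.
cleaned <- toy %>%
  mutate(
    State = trimws(State),
    Car_Occupant_Death_Number = as.numeric(Car_Occupant_Death_Number)
  ) %>%
  group_by(State) %>%
  summarize(n = max(Car_Occupant_Death_Number)) %>%
  arrange(desc(n))

print(text_max)
print(cleaned)
```

With this cleanup, each state in the toy data contributes a single row holding its numeric maximum.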
colnames(deaths_car_crashes)=c("State","Population","Deaths","Deaths_per_100000_Population","Year")
head(deaths_car_crashes%>%group_by(State)%>%summarize(n=as.integer(max(Deaths_per_100000_Population)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
## State n
## <chr> <int>
## 1 Wyoming 37
## 2 Mississippi 31
## 3 Montana 28
## 4 Wyoming 27
## 5 Alabama 26
## 6 New Mexico 25
head(deaths_car_crashes%>%group_by(State)%>%summarize(n=as.integer(max(Deaths)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
## State n
## <chr> <int>
## 1 California 4329
## 2 Texas 3776
## 3 California 3623
## 4 Florida 3543
## 5 Texas 3504
## 6 Florida 3174
summary(deaths_car_crashes)
## State Population Deaths
## Length:663 Min. : 509294 Min. : 15.0
## Class :character 1st Qu.: 1644697 1st Qu.: 223.0
## Mode :character Median : 4293204 Median : 505.0
## Mean : 6105580 Mean : 712.1
## 3rd Qu.: 6829052 3rd Qu.: 937.5
## Max. :39536653 Max. :4329.0
## Deaths_per_100000_Population Year
## Min. : 2.40 Min. :2005
## 1st Qu.: 9.00 1st Qu.:2008
## Median :12.10 Median :2011
## Mean :13.03 Mean :2011
## 3rd Qu.:16.20 3rd Qu.:2014
## Max. :37.90 Max. :2017
Interestingly, although most of the deaths occurred in Texas, California, and Florida, the highest death rates per 100,000 population are found in Wyoming, Mississippi, and Montana. This is likely because Texas, California, and Florida have large populations, so their high death counts are scaled down once they are normalized by population size.
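The normalization behind this observation is deaths divided by population, multiplied by 100,000. The sketch below uses made-up counts for two hypothetical states (not values from the dataset) to show how a state with many more deaths can still have the lower rate.

```r
# Hypothetical counts chosen only to illustrate the normalization.
crashes <- data.frame(
  State = c("Large state", "Small state"),
  Population = c(30000000, 600000),
  Deaths = c(3000, 150)
)

# Rate per 100,000 population = deaths / population * 100,000.
crashes$Deaths_per_100000_Population <-
  crashes$Deaths / crashes$Population * 100000

print(crashes)
```

Here the large state has 20 times as many deaths but a rate of about 10 per 100,000, compared with about 25 per 100,000 for the small state.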
We didn’t recreate all 5 design sheets; however, we did create an idea sheet and sketches for the new dataset.
Ideas Sheet #1
All members contributed equally to the design and production processes. Most of the design and coding was performed together with all group members present, therefore assessing specific contributions of each member is difficult. It should be noted, however, that Maria Nakhoul championed the debugging efforts on many of the visualizations, and all team members agree that she deserves recognition for her efforts and success there. Discussion regarding complex debugging and cleaning of the code also happened together.
The navigation bar in our visualization lets us switch between the different tabs.
Screenshots #1
The first tab of our visualization focuses on the CDC data for Motor Vehicle Death Rates in 2012 and 2014. The CDC data was stratified by Age, Gender, and Year, so we were able to create boxplots of the distributions to visualize the differences between 2012 and 2014. The bar plot shows the death rate per 100,000 population for all the states, in increasing order, for a specific year. The choropleth map visualizes the same death rates that are portrayed in the bar plot.
Screenshots #2
You can click on a boxplot to highlight it with a change of color; the click also filters the data based on your selection and updates the sidebar panel. Under the boxplots, a label shows which boxplot you have chosen. If you click on a bar in the bar plot, the corresponding state is highlighted in the choropleth map, and vice versa.
Screenshots #3
The second tab of our visualization compares a reference-year choropleth map with a map selected from the small multiples, using the Highway Loss Data Institute dataset of death rates per 100,000 population per state for the years 2005-2017. The small multiples help to visualize all the data at once and make comparisons. Under the small multiples, there is a label for selecting a small multiple to enlarge. After enlarging, it shows the year you have chosen as a reminder.
Screenshots #4
Once you choose a reference year and a small multiple, the 2 choropleth maps appear side by side for comparison, with the reference on the left and the small multiple on the right. When you hover over a state in either map, or in the small multiples above, you get a bar plot of the Highway Loss Data Institute data on the types of vehicles involved in fatal crashes for the years 2005-2017. The death totals in the 2 datasets are the same, so we were able to link the datasets in this visualization.
Screenshots #5
The third tab shows the differences in death rates between each year and a reference year. You choose a reference year, and the maps then show the difference between every other year and the reference year (other year minus reference year). Blue indicates negative values, while red indicates positive values. Of course, one of the maps will remain red because we didn’t subtract the reference from itself. We were unable to add a color scale: because of the recursive function that creates our small multiples, the color scales came up stacked on top of each other and overlapped, so the displayed values could not be read. The line graph represents the total number of deaths across all states for each year, and when a reference year is chosen, the point for that year turns red.
Screenshots #6
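The difference shown in the third tab (other year minus reference year) can be sketched as follows. This is an illustration under stated assumptions, not the code behind the visualization: the `rates` table, its values, and `reference_year` are hypothetical, and only the column name `Deaths_per_100000_Population` is taken from the data summary.

```r
library(dplyr)

# Hypothetical death rates for two states and three years.
rates <- tibble::tibble(
  State = rep(c("A", "B"), each = 3),
  Year = rep(c(2005, 2010, 2017), times = 2),
  Deaths_per_100000_Population = c(15, 12, 11, 20, 22, 18)
)
reference_year <- 2010

# Within each state, subtract the reference year's rate from every
# year's rate, then drop the reference year itself.
diffs <- rates %>%
  group_by(State) %>%
  mutate(
    Difference = Deaths_per_100000_Population -
      Deaths_per_100000_Population[Year == reference_year]
  ) %>%
  ungroup() %>%
  filter(Year != reference_year)

print(diffs)
```

Negative differences correspond to the blue end of the scale described above and positive differences to the red end.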
The fourth tab shows the correlation scatter plot between percentage of seatbelt usage in the states for a reference year and the death rates for that year. The bar plot represents the death count from driving under the influence for a reference year by state. The line graph shows the total deaths caused by driving under the influence in the United States for the years 2009-2017. This tab was created as a way to find some connection between driving under the influence and death numbers even when using seat belts.
Screenshots #7
When you choose a reference year, that specific year point lights up in the line graph. The scatter plot and bar plot are filtered for that year as well. When you click on a bar in the bar plot, the corresponding point in the scatter plot appears as well.
Screenshots #8
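The relationship shown in the scatter plot on the fourth tab can also be summarized with a correlation coefficient. The sketch below uses made-up values for four hypothetical states, so the resulting number says nothing about the real data; `belts` and its column names are introduced here only for illustration.

```r
# Hypothetical seat belt usage (%) and death rates for one reference year.
belts <- data.frame(
  State = c("A", "B", "C", "D"),
  Seat_Belt_Percent = c(70, 80, 90, 95),
  Deaths_per_100000_Population = c(20, 15, 12, 8)
)

# Pearson correlation, ignoring rows with missing values.
r <- cor(
  belts$Seat_Belt_Percent,
  belts$Deaths_per_100000_Population,
  use = "complete.obs"
)

print(r)
```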
Future work could look further into the unexpected correlation plots by mapping color to other potential contributors to motor-vehicle death rates, such as impaired driving rates and road conditions. It would also be interesting to find more data collected by FARS that hasn’t been presented to the public, or to see whether that data is stratified the way the CDC stratified theirs, by Age, Gender, and Year.
1- https://www.cdc.gov/motorvehiclesafety/impaired_driving/states-data-tables.html
2- https://www.cdc.gov/motorvehiclesafety/seatbelts/seatbelt_map.html
3- https://data.cdc.gov/Motor-Vehicle/Motor-Vehicle-Occupant-Death-Rate-by-Age-and-Gende/rqg5-mkef
4- https://www.iihs.org/iihs/topics/t/general-statistics/fatalityfacts/state-by-state-overview/2017